Abstract. We compare two approaches to Ricci curvature on non-smooth spaces, 
in the case of the discrete hypercube $\{0,1\}^N$. While the coarse Ricci curvature of the 
first author readily yields a positive value for curvature, the displacement convexity 
property of Lott, Sturm and the second author could not be fully implemented. 
Yet along the way we get new results of a combinatorial and probabilistic nature, 
including a curved Brunn-Minkowski inequality on the discrete hypercube. 

Let $A_0$, $A_1$ be two compact, nonempty subsets of $\mathbb{R}^n$. In one of its guises, the 
remarkable Brunn-Minkowski inequality states that 

$$\ln \operatorname{vol} A_t \geq (1-t) \ln \operatorname{vol} A_0 + t \ln \operatorname{vol} A_1$$

where $0 \leq t \leq 1$ and $A_t = \{(1-t)a_0 + t a_1,\ a_0 \in A_0,\ a_1 \in A_1\}$ is the set of t-midpoints 
between $A_0$ and $A_1$. In other words, the logarithm of the volume of $A_t$ is concave. 

We refer to [Gar02] for a nice survey. This is the "infinite-dimensional" version of 
the Brunn-Minkowski inequality, from which the more common version using 1/n-th 
powers instead of logarithms can be derived (see Eq. (22) in [Gar02]). 

If $\mathbb{R}^n$ is replaced with a Riemannian manifold, the presence of positive curvature 
improves this inequality. Indeed, in [CMS06] (elaborating on [CMS01]) it is proved 
that if X is a smooth and complete Riemannian manifold with Ricci curvature at 
least K for some $K \in \mathbb{R}$, then for any two compact, nonempty subsets $A_0, A_1 \subset X$, 
we have 

$$\ln \operatorname{vol} A_t \geq (1-t) \ln \operatorname{vol} A_0 + t \ln \operatorname{vol} A_1 + \frac{K}{2}\, t(1-t)\, d(A_0, A_1)^2.$$

Here the set of t-midpoints $A_t$ is defined as the set of all $\gamma(t)$ where $\gamma$ is any 
minimizing geodesic such that $\gamma(0) \in A_0$ and $\gamma(1) \in A_1$. The distance $d(A_0, A_1)$ is 
$\inf_{a_0 \in A_0,\, a_1 \in A_1} d(a_0, a_1)$. 

FIGURE 1. In positive curvature, midpoints spread out. 

FIGURE 2. In positive curvature, balls are closer than their centers [figure label: $(1-\kappa)\,d$ on average]. 

Actually this kind of inequality has been used as a tentative definition of positive 
Ricci curvature on more general, non-smooth spaces. The idea is that, in positive 
curvature, "midpoints spread out" so that the set of midpoints of two given sets 
is larger than in the reference Euclidean case (Fig. 1). This led to the notion of 
displacement convexity of entropy for Riemannian manifolds [RS05, CMS01, OV00], 
later developed by Sturm [Stu06] and Lott and the second author [LV09]. However, 
it is not clear how this fares for discrete spaces [BS09]. 

Another approach to define the Ricci curvature of discrete spaces is coarse Ricci 
curvature, developed by the first author [Oll07, Oll09]. The motto is that, in positive 
curvature, "balls are closer than their centers are" in transportation distance (Fig. 2). 

We compare both approaches applied to the discrete hypercube $X = \{0,1\}^N$. This 
is the simplest discrete space expected to have positive Ricci curvature in some 
sense, for a variety of reasons (see, e.g., paragraph 3½.21 "Spheres, cubes, and the law 
of large numbers" in [Gro99]). The subtitle question "What is the Ricci curvature of 
the discrete hypercube?" was asked verbatim by Stroock in a seminar as early as 
1998, in a context of logarithmic Sobolev inequalities. 

The formalism of coarse Ricci curvature is readily available for the hypercube and 
yields a value of $2/(N+1)$ for the Ricci curvature of $\{0,1\}^N$ (section 2.1). On the other 
hand, we could not fully implement the displacement convexity of entropy (properly 
discretized) in the hypercube. Yet, along the way, we still get a combinatorial 
Brunn-Minkowski inequality on the hypercube, including a positive curvature term. 
The resulting value of curvature is $\sim 1/N$, compatible with coarse Ricci curvature. 

Acknowledgements: The authors would like to thank Prasad Tetali for helpful 
comments on concentration in the symmetric group, which led to improved constants. 

1. STATEMENT OF RESULTS 

1.1. Brunn-Minkowski inequality in the hypercube. We consider the discrete 
hypercube $X := \{0,1\}^N$, $N \in \mathbb{N}$, equipped with the Hamming (or $\ell^1$) metric 

$$d((x_i), (y_i)) := \#\{i,\ x_i \neq y_i\}.$$

For A and B nonempty subsets of X, we define $d(A, B) := \inf_{a \in A,\, b \in B} d(a, b)$. 

Let a and b be two points in X. A midpoint of a and b is any point m such that 
$d(m,a) + d(m,b) = d(a,b)$ and $|d(m,a) - d(a,b)/2| < 1$. More explicitly: if $d(a,b)$ 
is even, a midpoint is the middle point on any shortest path from a to b in X, and if 
$d(a,b)$ is odd, a midpoint is one of the two middlemost points on such a shortest path. 
In the hypercube, midpoints are by no means unique: the number of midpoints of a 
and b is the binomial coefficient $\binom{d(a,b)}{d(a,b)/2}$ if $d(a,b)$ is even, and 
$2\binom{d(a,b)}{(d(a,b)-1)/2}$ if $d(a,b)$ is odd. 

If A and B are two subsets of X, the set of midpoints of A and B is the set of 
midpoints of all pairs $(a, b) \in A \times B$. 
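
The definition of midpoints can be made concrete with a short enumeration. The following is only an illustrative sketch: points of the hypercube are represented as Python tuples of bits, and the names `hamming`, `midpoints`, `midpoint_set` and `expected_count` are hypothetical helpers introduced here, not notation from the text.

```python
from itertools import combinations
from math import comb

def hamming(a, b):
    """Hamming distance between two bit tuples of equal length."""
    return sum(x != y for x, y in zip(a, b))

def midpoints(a, b):
    """Set of midpoints of a and b in the hypercube {0,1}^N."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    r = len(diff)
    result = set()
    # Take floor(r/2) or ceil(r/2) of the differing bits from a, the rest from b.
    for k in {r // 2, (r + 1) // 2}:
        for picked in combinations(diff, k):
            m = list(b)
            for i in picked:
                m[i] = a[i]
            result.add(tuple(m))
    return result

def midpoint_set(A, B):
    """Set of midpoints of all pairs (a, b) in A x B."""
    return set().union(*(midpoints(a, b) for a in A for b in B))

def expected_count(r):
    """Number of midpoints of two points at distance r, as stated in the text."""
    return comb(r, r // 2) if r % 2 == 0 else 2 * comb(r, (r - 1) // 2)
```

For instance, `len(midpoints((0, 0, 0, 0), (1, 1, 1, 1)))` can be compared with `expected_count(4)`.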

Theorem 1. Let A and B be two nonempty subsets of $\{0,1\}^N$. Let M be the set of 
midpoints of A and B. Then 

$$\ln \#M \geq \frac{1}{2} \ln \#A + \frac{1}{2} \ln \#B + \frac{K}{8}\, d(A, B)^2$$

with $K = \frac{1}{2N}$. 

This is analogous to the curved Brunn-Minkowski inequality above in Riemannian 
manifolds (for t = 1/2), with K playing the role of a curvature lower bound. 

The order of magnitude for K is optimal: indeed, when A and B are singletons 
lying at distance N, then $d(A, B)^2 = N^2$, while the number of midpoints is 
$\binom{N}{N/2} \sim 2^N \sqrt{2/(\pi N)}$, so that $\ln \#M$ grows linearly in N. 
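
The optimality remark can be illustrated numerically. In this sketch (function names are hypothetical, and N is assumed even), both $\ln \#M$ for two antipodal singletons and the curvature term $\frac{K}{8} d(A,B)^2$ with $K = 1/(2N)$ are divided by N: the first ratio approaches $\ln 2$, the second is constant, so K could at best be improved by a constant factor.

```python
from math import comb, log

def log_midpoint_count(N):
    """ln #M for two antipodal singletons in {0,1}^N (N assumed even)."""
    return log(comb(N, N // 2))

def curvature_term(N):
    """(K/8) d(A,B)^2 with K = 1/(2N) and d(A,B) = N, i.e. N/16."""
    K = 1 / (2 * N)
    return K / 8 * N ** 2

for N in (10, 100, 1000):
    # Both quantities scale linearly in N.
    print(N, log_midpoint_count(N) / N, curvature_term(N) / N)
```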
We will now see that this theorem can be improved by replacing d(A, B) with a 
transportation distance. 

1.2. Entropy of midpoints in the hypercube. Theorem 1 appears as a particular 
case of a refined statement using probability measures instead of sets. 

Let $\mu$ be a probability measure on a discrete set X. Its Shannon entropy is 

$$S(\mu) := -\sum_{x \in X} \mu(x) \ln \mu(x).$$

In particular, if $\mu$ is the uniform distribution on a finite subset $A \subset X$, then 
$S(\mu) = \ln \#A$. 

In this paper, we shall also use the relative entropy (or Kullback-Leibler divergence) 
of a measure $\mu$ with respect to a reference probability measure $\nu$, defined as 

$$H(\mu|\nu) := \sum_{x \in X} \mu(x) \ln \frac{\mu(x)}{\nu(x)} \geq 0.$$

If X is finite and the reference measure $\nu$ is uniform on X, then we have 
$H(\mu|\nu) = \ln \#X - S(\mu)$. 
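
As a quick sanity check of the identity $H(\mu|\nu) = \ln \#X - S(\mu)$ for uniform $\nu$, here is a minimal sketch. Measures are represented as dictionaries mapping points to masses; this representation, the function names and the example measure are assumptions made for illustration only.

```python
import math

def shannon_entropy(mu):
    """S(mu) = -sum mu(x) ln mu(x), for mu given as a dict point -> mass."""
    return -sum(p * math.log(p) for p in mu.values() if p > 0)

def relative_entropy(mu, nu):
    """H(mu|nu) = sum mu(x) ln(mu(x)/nu(x)); assumes nu(x) > 0 wherever mu(x) > 0."""
    return sum(p * math.log(p / nu[x]) for x, p in mu.items() if p > 0)

X = range(8)
nu = {x: 1 / 8 for x in X}          # uniform reference measure on X
mu = {0: 0.5, 1: 0.25, 2: 0.25}     # an arbitrary example measure
lhs = relative_entropy(mu, nu)
rhs = math.log(len(X)) - shannon_entropy(mu)   # should agree with lhs
```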

To state an entropic version of Theorem 1 we define the midpoints of two measures 
as follows. Loosely speaking, we first pick a random point a under $\mu_0$, then an 
independent random point b under $\mu_1$, and finally we pick a random midpoint of a 
and b uniformly over all such midpoints. 

More precisely, let a and b be two points of the hypercube X. The midpoint 
measure mid(a,b) is defined as the uniform probability measure on all midpoints of 
a and b. Let now $\mu_0, \mu_1$ be two probability measures on X. The midpoint measure 
of $\mu_0$ and $\mu_1$ is defined as 

$$\operatorname{mid}(\mu_0, \mu_1) := \iint \operatorname{mid}(a, b)\, d\mu_0(a)\, d\mu_1(b).$$
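
The midpoint measure is a finite mixture and can be computed directly on small examples. The sketch below assumes measures are dictionaries from bit tuples to masses; `midpoints` and `mid_measure` are hypothetical helper names.

```python
from collections import defaultdict
from itertools import combinations

def midpoints(a, b):
    """Set of midpoints of two bit tuples a and b."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    r = len(diff)
    result = set()
    for k in {r // 2, (r + 1) // 2}:
        for picked in combinations(diff, k):
            m = list(b)
            for i in picked:        # these differing bits are taken from a
                m[i] = a[i]
            result.add(tuple(m))
    return result

def mid_measure(mu0, mu1):
    """mid(mu0, mu1): average of the uniform measures mid(a, b) under mu0 x mu1."""
    out = defaultdict(float)
    for a, pa in mu0.items():
        for b, pb in mu1.items():
            ms = midpoints(a, b)
            for m in ms:
                out[m] += pa * pb / len(ms)
    return dict(out)
```

When both inputs are Dirac masses, the result is simply the uniform measure on the midpoints of the two points.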

Theorem 2. Let $\mu_0$ and $\mu_1$ be two probability measures on the discrete hypercube 
$X = \{0,1\}^N$. Let $\mu_{1/2} = \operatorname{mid}(\mu_0, \mu_1)$ be their midpoint measure. Then 

$$S(\mu_{1/2}) \geq \frac{1}{2} S(\mu_0) + \frac{1}{2} S(\mu_1) + \frac{K}{8}\, W_1(\mu_0, \mu_1)^2$$

with $K = \frac{1}{2N}$. Equivalently, 

$$H(\mu_{1/2}|\nu) \leq \frac{1}{2} H(\mu_0|\nu) + \frac{1}{2} H(\mu_1|\nu) - \frac{K}{8}\, W_1(\mu_0, \mu_1)^2$$

with $\nu$ the uniform probability measure on $\{0,1\}^N$. 

Here we use the $L^1$ Wasserstein distance 

$$W_1(\mu, \mu') := \inf_{\xi} \iint d(a, b)\, d\xi(a, b)$$

where the infimum is taken over all measures $\xi$ on $X \times X$ such that $\int_b d\xi(a, b) = d\mu(a)$ 
and $\int_a d\xi(a, b) = d\mu'(b)$, i.e., all couplings of $\mu$ and $\mu'$. We refer to [Vil03] for more 
background on this topic. 

Note that $W_1(\mu_0, \mu_1)$ is always at least $d(A, B)$ for $\mu_0$ and $\mu_1$ supported in sets A 
and B; in particular, if $\mu_0$ and $\mu_1$ are taken uniform in A and B, Theorem 2 is really 
a refinement of Theorem 1. 
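
On very small examples the entropic inequality can be checked by brute force. The sketch below makes several simplifying assumptions: $\mu_0$ and $\mu_1$ are uniform on lists A and B of distinct points with the same number of elements, so that an optimal coupling can be taken to be a bijection and $W_1$ can be found by trying all permutations. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations, permutations

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def midpoints(a, b):
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    r = len(diff)
    out = set()
    for k in {r // 2, (r + 1) // 2}:
        for picked in combinations(diff, k):
            m = list(b)
            for i in picked:
                m[i] = a[i]
            out.add(tuple(m))
    return out

def entropy(mu):
    return -sum(p * math.log(p) for p in mu.values() if p > 0)

def w1_uniform(A, B):
    """W_1 between uniform measures on two equal-size lists of distinct points
    (brute force over bijections; only viable for very small sets)."""
    n = len(A)
    assert len(B) == n
    return min(sum(hamming(a, B[j]) for a, j in zip(A, perm))
               for perm in permutations(range(n))) / n

def theorem2_gap(A, B, N):
    """Left side minus right side of the entropy inequality, with K = 1/(2N)."""
    mid = defaultdict(float)
    for a in A:
        for b in B:
            ms = midpoints(a, b)
            for m in ms:
                mid[m] += 1 / (len(A) * len(B) * len(ms))
    K = 1 / (2 * N)
    return (entropy(mid) - 0.5 * math.log(len(A)) - 0.5 * math.log(len(B))
            - K / 8 * w1_uniform(A, B) ** 2)

A = [(0, 0, 0, 0), (1, 1, 0, 0)]
B = [(0, 0, 0, 1), (1, 1, 1, 1)]
d_AB = min(hamming(a, b) for a in A for b in B)   # set distance d(A, B)
w1 = w1_uniform(A, B)                             # transportation distance
gap = theorem2_gap(A, B, 4)                       # expected to be nonnegative
```

In this example the transportation distance is strictly larger than the set distance, which is the sense in which the entropic statement refines the set-theoretic one.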



1.3. Limitations and open questions. A first limitation of these results is the 
necessity to take $t = 1/2$. This comes from the combinatorial nature of our proof, 
which, for the most basic situation $K = 0$, consists in building an injection from 
$A \times B$ into $M \times M$. 

This can probably be circumvented if we assume that the sets A and B are convex 
(i.e. the midpoint of two points in A lies in A, and likewise for B): then, we can 
describe t-midpoints of A and B as iterated 1/2-midpoints. (If A or B are not convex, 
iterating only yields midpoints of several points in A and several points in B, which 
is not what we want.) 

The injection from $A \times B$ into $M \times M$ used in our proof very naturally extends to 
an injection from $A \times B$ into $M_t \times M_{1-t}$, with $M_t$ the set of t-midpoints. This leads 
to a lower bound for $\ln \#M_t + \ln \#M_{1-t}$ in terms of $\ln \#A + \ln \#B$ plus a curvature 
term. This also holds in the Riemannian case (by adding the Brunn-Minkowski 
inequality for t and for $(1 - t)$). We do not know if there is a particular interpretation 
of this inequality. 

Our initial goal was to prove that the discrete hypercube has positive Ricci 
curvature in the sense of Lott, Sturm and the second author, i.e., that the hypercube 
satisfies displacement convexity of entropy (see below). The main difference with 
our result is that, in the Brunn-Minkowski inequality, we consider all midpoints of 
all pairs of points (a, b) with law $\mu_0 \otimes \mu_1$, whereas for displacement convexity, one 
should first choose an optimal coupling between $\mu_0$ and $\mu_1$ and then only consider 
the midpoints of those pairs (a, b) that make up the optimal coupling. The two 
properties coincide only when $\mu_0$ is a Dirac measure, in which case our result is 
related to Sturm's measure contraction property [Stu06]. 

So as far as we know, the problem of computing the Ricci curvature of the hypercube 
using the displacement convexity approach is still open. 






Y. OLLIVIER AND C. VILLANI 




FIGURE 3. Coarse Ricci curvature in the hypercube. 



2. TWO APPROACHES TO DISCRETE RICCI CURVATURE 

We now present in more detail the two known approaches for Ricci curvature on 
discrete spaces. This is not necessary to understand our results and proofs, but 
provides the original motivation. 

2.1. Coarse Ricci curvature (after the first author). The basic idea of coarse 
Ricci curvature is to take two small balls and compute the transportation distance 
between them. If this distance is smaller than the distance between the centers of 
the balls, then coarse Ricci curvature is positive. 

This is formalized as follows [Oll07, Oll09]. Let (X, d) be a metric space equipped 
with a measure $\mu$. Let $\varepsilon$ be a discretization parameter (we take $\varepsilon = 1$ for a graph) 
and assume that all $\varepsilon$-balls in X have finite and non-zero measure. For $x \in X$ define 
the measure $\mu_x$ by restricting $\mu$ to the closed $\varepsilon$-ball around x: 

$$\mu_x := \frac{\mu|_{B(x,\varepsilon)}}{\mu(B(x,\varepsilon))}$$

with $B(x,\varepsilon) = \{y \in X,\ d(x,y) \leq \varepsilon\}$. 

If x and y are two points in X, then the coarse Ricci curvature along (x, y) is the 
number $\kappa(x,y)$ defined by 

$$W_1(\mu_x, \mu_y) =: (1 - \kappa(x,y))\, d(x,y)$$

where $W_1$ is the $L^1$ Wasserstein distance as defined earlier. If this is applied to a 
Riemannian manifold, this gives back the ordinary Ricci curvature when $\varepsilon \to 0$, up 
to scaling by $\varepsilon^2$. 

Let us apply this to the discrete hypercube $X = \{0,1\}^N$ equipped with the uniform 
measure. The measure $\mu_x$ is uniform on the N + 1 neighbors of x (counting x itself). 
When x and y are neighbors, it is very easy to compute the curvature $\kappa(x,y)$, as 
illustrated in Figure 3. Indeed, we have to move the N + 1 neighbors of x to the 
N + 1 neighbors of y; out of these N + 1 points, two are already in place (x and y 
themselves) and do not need to move, and the others have to move by a distance 1. 
So $W_1(\mu_x, \mu_y) = 1 - 2/(N+1)$ and $\kappa(x,y) = 2/(N+1)$. 
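
This computation can be reproduced by brute force for small N. In the sketch below, the measures $\mu_x$ and $\mu_y$ are uniform on N + 1 points each, so an optimal coupling can be taken to be a bijection and all permutations are tried. The helper names (`ball`, `w1_uniform`, `coarse_ricci`) are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import permutations

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def ball(x):
    """Closed unit ball around x: x itself and its N neighbours."""
    return [x] + [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def w1_uniform(P, Q):
    """W_1 between uniform measures on two equal-size point lists,
    by brute force over bijections (small N only)."""
    n = len(P)
    best = min(sum(hamming(p, Q[j]) for p, j in zip(P, perm))
               for perm in permutations(range(n)))
    return Fraction(best, n)

def coarse_ricci(x, y):
    """kappa(x, y) defined by W_1(mu_x, mu_y) = (1 - kappa(x, y)) d(x, y)."""
    return 1 - w1_uniform(ball(x), ball(y)) / hamming(x, y)

kappa_3 = coarse_ricci((0, 0, 0), (1, 0, 0))        # neighbours in {0,1}^3
kappa_4 = coarse_ricci((0, 0, 0, 0), (0, 1, 0, 0))  # neighbours in {0,1}^4
```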

If x and y are not neighbors, we use a locality property of coarse Ricci curvature. 
Namely, if the space X is $\delta$-geodesic (i.e. if the distance between two points is realized 
by a sequence of points with jumps at most $\delta$), then it is enough to compute $\kappa(x,y)$ 
for $d(x,y) \leq \delta$ (Exercise 2 in [Oll07]). A graph is 1-geodesic by definition of the 
graph metric, so it is enough to work with neighbors. 

A lower bound on coarse Ricci curvature comes with a number of consequences 
[Oll09]. For the discrete hypercube equipped with the uniform measure these properties 
were already known (but not on the hypercube with e.g. Bernoulli(6'/A) measures 
[JO10]). 

In general, one may directly choose an arbitrary Markov kernel $\mu_x$ (without using 
a global measure $\mu$); this leads to interesting applications [JO10]. 

2.2. Displacement convexity (after Lott, Sturm and the second author). In 
[RS05] (following ideas from [OV00]), Renesse and Sturm present a characterization 
of Ricci curvature on Riemannian manifolds, based on the idea that in positive 
curvature, "midpoints spread out". 

Let X be a smooth, complete Riemannian manifold. Let dx be the Riemannian 
volume measure on X. Given a probability measure $\mu$ on X, define its relative 
entropy as $H(\mu|dx) := \int \ln \frac{d\mu}{dx}\, d\mu$ if the integral makes sense, or $+\infty$ otherwise. 

Let $\mathcal{P}_2(X)$ be the set of probability measures on X with finite second moment, i.e. 
those probability measures $\mu$ such that $\int d(pt, x)^2\, d\mu(x) < \infty$ for some (hence any) 
point $pt \in X$. On $\mathcal{P}_2(X)$, the Wasserstein distance $W_2$ is well-defined. Moreover, 
$\mathcal{P}_2(X)$ equipped with the metric $W_2$ is a geodesic space: given any two probability 
measures $\mu_0, \mu_1 \in \mathcal{P}_2(X)$, there exists a curve $(\mu_t)_{t \in [0;1]}$ in $\mathcal{P}_2(X)$ with 
$W_2(\mu_t, \mu_{t'}) = |t - t'|\, W_2(\mu_0, \mu_1)$ for $t, t' \in [0;1]$. Such a curve is called a displacement interpolation 
between $\mu_0$ and $\mu_1$. We refer to Chapter 7 of [Vil08] for more details. 

Theorem 1.1 in [RS05] asserts that the Riemannian manifold X has Ricci curvature 
at least $K \in \mathbb{R}$ if and only if the following inequality is satisfied: for any two measures 
$\mu_0, \mu_1 \in \mathcal{P}_2(X)$, for any $W_2$-geodesic $(\mu_t)_{t \in [0;1]}$ joining them, we have 

$$H(\mu_t|dx) \leq (1-t) H(\mu_0|dx) + t H(\mu_1|dx) - \frac{K}{2}\, t(1-t)\, W_2(\mu_0, \mu_1)^2,$$

a property called displacement convexity of the entropy function. 

For any probability measure $\mu$ we have $H(\mu|dx) \geq -\ln \operatorname{vol} \operatorname{Supp}(\mu)$, with equality 
when $\mu$ is uniform on its support. Taking $\mu_0$ and $\mu_1$ to be uniform probability 
distributions on sets $A_0$ and $A_1$ respectively, we see that displacement convexity of 
entropy implies an inequality between the logarithms of the volumes of the support 
of $\mu_t$, $\mu_0$ and $\mu_1$. This inequality is very similar to the Brunn-Minkowski inequality 
mentioned earlier. Actually, an important property of displacement interpolation 
is that the measure $\mu_t$ will charge only t-midpoints between the supports of $\mu_0$ and 
$\mu_1$ (Corollary 7.22 in [Vil08], basically due to Brenier and McCann), and so the 
Brunn-Minkowski inequality in a Riemannian manifold really follows from convexity 
of entropy. 

Displacement convexity of entropy makes sense in an arbitrary geodesic space. In 
[Stu06, LV09], it is taken as the basis for a notion of Ricci curvature in such spaces. 
The definition depends on two parameters K (the curvature) and N (a "dimension"). 
Displacement convexity of entropy as written here corresponds to N = oo, the 
simplest and weakest case. 

Interestingly, this approach applies to spaces with positive curvature in the sense 
of Alexandrov [Pet]. 

Application to discrete spaces requires some changes: for instance, in the case of 
the hypercube considered in this article, clearly if two points are at odd distance 
they do not have an exact midpoint, but they have an approximate midpoint up 
to an error term ±1/2. Such an approach is used in [Bon09] to define the Brunn- 
Minkowski inequality on discrete spaces. In [BS09], Bonciocat and Sturm use 
approximate midpoints in the space of probability measures to extend the definition 
of displacement convexity of entropy to discrete spaces, and provide examples of 
planar graphs satisfying this property. To our knowledge, these planar graphs are 
the only discrete examples so far. 

3. BRUNN-MINKOWSKI INEQUALITY WITHOUT CURVATURE 

To make the idea clearer and introduce necessary concepts, we begin with a 
simplified version of Theorem 1, namely the same statement with K = 0. So let 
A, B be two nonempty subsets of the hypercube $X = \{0,1\}^N$. Let M be the set of 
midpoints of A and B. We want to prove that 

$$\ln \#M \geq \frac{1}{2} (\ln \#A + \ln \#B)$$

or equivalently 

$$\#(M \times M) \geq \#(A \times B).$$

Let $a = (a_i)_{1 \leq i \leq N} \in A$ and $b = (b_i)_{1 \leq i \leq N} \in B$. A midpoint $m = (m_i)$ of a and b is a 
sequence of bits such that $m_i = a_i$ whenever $a_i = b_i$ and such that half the remaining 
bits coincide with those of a and the other half with those of b. Let $r = d(a, b)$ be 
the number of distinct bits between a and b. For fixed a and b, there is a one-to-one 
correspondence between the midpoints m of a and b and the subsets $c \subset \{1, \dots, r\}$ 
with cardinality r/2 (if r is even) or r/2 ± 1/2 (r odd): among the r distinct bits 
between a and b, the set c describes those picked from a in the construction of m. 

We shall call r-crossover such a $c \subset \{1, \dots, r\}$ with $|\#c - r/2| \leq 1/2$. We shall 
denote $m = \varphi_c(a, b)$ the midpoint of a and b defined by crossover c. If c is a crossover, 
we shall denote by $\bar c$ its complement, which is also a crossover. 

Note that, given a fixed d(a, b)-crossover c, the pair $\Phi_c(a, b) := (\varphi_c(a, b), \varphi_{\bar c}(a, b)) = (m, m')$ 
allows one to recover a and b. Indeed, the identical bits in m and m' are the 
same as in a and b; the bits that differ between m and m' also differ between a and 
b, and knowledge of the crossover c tells us exactly which of those come from a or b. 

In particular, for each $r \in \{0, \dots, N\}$, let us define the r-crossover $c_r := \{1, 2, \dots, \lfloor r/2 \rfloor\}$. 
Then the map $(a, b) \mapsto \Phi_{c_{d(a,b)}}(a, b)$ is an injection from $A \times B$ to $M \times M$ where M 
is the set of midpoints of A and B. This proves that $\#(A \times B) \leq \#(M \times M)$ as 
needed. 
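
The coding map and its inverse are easy to write down explicitly. The following sketch uses 0-based indices into the list of differing positions (so the canonical crossover $\{1, \dots, \lfloor r/2 \rfloor\}$ becomes `range(r // 2)`); this indexing and the names `encode`, `decode`, `canonical_crossover` are choices made here for illustration. It then checks injectivity on the whole of $X \times X$ for N = 4.

```python
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def encode(a, b, c):
    """Phi_c(a, b) = (m, m'): m takes a's bits at the differing positions
    indexed by c and b's bits at the others; m' does the opposite."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    m, m2 = list(b), list(a)
    for k in c:
        i = diff[k]
        m[i], m2[i] = a[i], b[i]
    return tuple(m), tuple(m2)

def decode(m, m2, c):
    """Recover (a, b) from (m, m') and the crossover c."""
    diff = [i for i in range(len(m)) if m[i] != m2[i]]
    a, b = list(m2), list(m)
    for k in c:
        i = diff[k]
        a[i], b[i] = m[i], m2[i]
    return tuple(a), tuple(b)

def canonical_crossover(r):
    """0-based analogue of c_r = {1, ..., floor(r/2)}."""
    return range(r // 2)

N = 4
X = list(product((0, 1), repeat=N))
images = {(a, b): encode(a, b, canonical_crossover(hamming(a, b)))
          for a in X for b in X}
```

Injectivity amounts to `len(set(images.values())) == len(X) ** 2`, and `decode` inverts `encode` because m and m' differ exactly where a and b do.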

For later use, let us state a property of the coding maps $\varphi_c$ and $\Phi_c$. If $\Phi_c(a, b) = (m, m')$, 
we denote $a = \varphi_c^{-1}(m, m')$ and $b = \varphi_{\bar c}^{-1}(m, m') = \varphi_c^{-1}(m', m)$. 
Let us equip the set of crossovers $C_r$ with the distance 

$$d(c, c') := \#(c \setminus c') + \#(c' \setminus c).$$

Proposition 3 (Decoding is isometric). Let $m, m' \in \{0,1\}^N$. Let $c_1, c_2 \in C_{d(m,m')}$. 
Let $a_1 = \varphi_{c_1}^{-1}(m, m')$ and $a_2 = \varphi_{c_2}^{-1}(m, m')$. Then $d(a_1, a_2) = d(c_1, c_2)$. 

Proof. Given m and m', modifying the crossover c changes the preimage $\varphi_c^{-1}(m, m')$ 
by the same amount. □ 
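
The isometry property can be checked exhaustively for a given pair (m, m'). This is a minimal sketch with hypothetical helper names, again using 0-based indices into the list of positions where m and m' differ.

```python
from itertools import combinations

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def crossovers(r):
    """All r-crossovers, as frozensets of 0-based indices into the differing bits."""
    return [frozenset(c) for k in sorted({r // 2, (r + 1) // 2})
            for c in combinations(range(r), k)]

def decode_a(m, m2, c):
    """The point a such that Phi_c(a, b) = (m, m')."""
    diff = [i for i in range(len(m)) if m[i] != m2[i]]
    a = list(m2)
    for k in c:
        a[diff[k]] = m[diff[k]]
    return tuple(a)

def crossover_distance(c1, c2):
    """Size of the symmetric difference of the two crossovers."""
    return len(c1 ^ c2)

m, m2 = (0, 1, 0, 1, 1), (1, 0, 0, 0, 1)   # an arbitrary pair at distance 3
C = crossovers(hamming(m, m2))
isometric = all(
    hamming(decode_a(m, m2, c1), decode_a(m, m2, c2)) == crossover_distance(c1, c2)
    for c1 in C for c2 in C
)
```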

4. CONCENTRATION IN THE SET OF CROSSOVERS 

To get an improved inequality with positive curvature K, we will need to study 
geometric properties of the set of crossovers; more precisely we show that this set 
exhibits concentration of measure. This is obtained from the well-known concentration 
of measure in the permutation group by a quotienting argument. (We refer to [Led01] 
for more background about concentration of measure.) We first state concentration 
in the permutation group under the form we need. 

Lemma 4 (Concentration in $S_n$). Let $S_n$ be the permutation group on $\{1, \dots, n\}$. 
Equip $S_n$ with the distance $d(\sigma, \sigma') = \#\{i,\ \sigma(i) \neq \sigma'(i)\}$ for $\sigma, \sigma' \in S_n$. Let $\nu$ be the 
uniform probability measure on $S_n$. 
Let $f : S_n \to \mathbb{R}$ be a 1-Lipschitz function. Then f satisfies the concentration 
inequality 

$$\nu\Big(\Big\{ f \geq \int f\, d\nu + t \Big\}\Big) \leq e^{-t^2/(2(n-1))} \qquad \forall t \geq 0$$

and the Laplace transform estimate 

$$\int e^{\lambda f}\, d\nu \leq e^{\lambda \int f\, d\nu + (n-1)\lambda^2/2} \qquad \forall \lambda \in \mathbb{R}.$$

Proof. The second statement is Proposition 6.1 in [BHT06]. The first statement 
follows by the exponential Markov inequality. □ 

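Proposition 5 (The set of crossovers is concentrated). Let $n \geq 1$ and let $C_n$ be 
the set of parts $c \subset \{1, \dots, n\}$ with $|\#c - n/2| < 1$. Equip $C_n$ with the distance 
$d(c, c') := \#(c \setminus c') + \#(c' \setminus c)$ as above and with the uniform probability measure $\mu$. 

Let $f : C_n \to \mathbb{R}$ be a 1-Lipschitz function. Then f satisfies the concentration 
inequality 

$$\mu\Big(\Big\{ f \geq \int f\, d\mu + t \Big\}\Big) \leq e^{-t^2/2n} \qquad \forall t \geq 0$$

and the Laplace transform estimate 

$$\int e^{\lambda f}\, d\mu \leq e^{\lambda \int f\, d\mu + n\lambda^2/2} \qquad \forall \lambda \in \mathbb{R}.$$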

Proof. Let us begin with even n. Then the natural action of $S_n$ on $\{1, \dots, n\}$ 
preserves $C_n$. Let us fix an origin $c_0 := \{1, \dots, n/2\} \in C_n$ and define the projection 
map $\pi : S_n \to C_n$ by $\sigma \mapsto \sigma(c_0)$. Each fiber of $\pi$ has the same cardinality $((n/2)!)^2$. 
Moreover, if we equip $S_n$ and $C_n$ with the distances as above, then the map $\pi$ is 
1-Lipschitz. 

Thus, if $f : C_n \to \mathbb{R}$ is a 1-Lipschitz function, the function $\tilde f := f \circ \pi$ is 1-Lipschitz 
on $S_n$. So $\tilde f$ satisfies the concentration property $\nu(\{\tilde f \geq \int \tilde f\, d\nu + t\}) \leq e^{-t^2/(2(n-1))}$ 
where $\nu$ is the uniform probability measure on $S_n$. Since all fibers of $\pi$ have the same 
cardinality, $\pi$ sends $\nu$ to the uniform measure $\mu$ and so the same estimate holds for 
f in $C_n$ under $\mu$. The argument is identical for the Laplace transform estimate. 
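
The two properties of the projection used in the even case (equal fiber sizes and the 1-Lipschitz property) can be verified exhaustively for a small even n. The sketch below takes n = 4 and uses 0-based labels, so the origin $\{1, \dots, n/2\}$ becomes `{0, ..., n/2 - 1}`; permutations are tuples `sigma` with `sigma[i]` the image of `i`. All names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial

n = 4
c0 = frozenset(range(n // 2))        # 0-based analogue of the origin {1, ..., n/2}

def project(sigma):
    """pi(sigma) = sigma(c0), the image of the origin under the permutation."""
    return frozenset(sigma[i] for i in c0)

def perm_distance(s, t):
    """d(sigma, sigma') = number of points where the two permutations differ."""
    return sum(x != y for x, y in zip(s, t))

perms = list(permutations(range(n)))
fiber_sizes = Counter(project(s) for s in perms)
is_lipschitz = all(len(project(s) ^ project(t)) <= perm_distance(s, t)
                   for s in perms for t in perms)
```

Here `fiber_sizes` should have one entry per element of $C_n$, each with value $((n/2)!)^2$.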

For odd n we proceed as follows. Let us fix $c_0 = \{1, \dots, \lfloor n/2 \rfloor\} \in C_n$ and 
$c_1 = \{1, \dots, \lceil n/2 \rceil\} \in C_n$. Let us define the set $S^* := S_n \times \{0\} \cup S_n \times \{1\}$. Define 
the map $\pi : S^* \to C_n$ by $(\sigma, i) \mapsto \sigma(c_i)$ for i = 0, 1. Then each fiber of $\pi$ has the 
same cardinality $\lfloor n/2 \rfloor!\, \lceil n/2 \rceil!$. Let us equip $S^*$ with the metric 
$d((\sigma, i), (\sigma', i')) = |i - i'| + d(\sigma, \sigma')$. Then one checks that $\pi$ is 1-Lipschitz from $S^*$ to $C_n$. (A more 
elegant construction would have used $c \mapsto \bar c$ to get a group structure on $S^*$, but this 
has bad metric properties.) 

Given a 1-Lipschitz function $f : C_n \to \mathbb{R}$, consider as above the function $\tilde f := f \circ \pi$ 
on $S^*$. Applying, for instance, the technique of Theorem 4.2 in [Led01] to get 
concentration of measure in $S^*$ instead of $S_n$, we get that $\tilde f$ satisfies the Laplace 
transform estimate 

$$\int e^{\lambda \tilde f}\, d\nu \leq e^{\lambda \int \tilde f\, d\nu + (n-1)\lambda^2/2 + \lambda^2/8} \leq e^{\lambda \int \tilde f\, d\nu + n\lambda^2/2}$$

with $\nu$ the uniform probability measure on $S^*$. This implies that $\nu(\{\tilde f \geq \int \tilde f\, d\nu + t\}) \leq e^{-t^2/2n}$. 
Just as above, this estimate then holds for f on $C_n$. □ 

Corollary 6. Let A be a subset of the set of crossovers $C_n$ and let $\bar A := \{\bar c,\ c \in A\}$. 
Suppose that $d(A, \bar A) \geq k$. Then 

$$\#A \leq e^{-k^2/8n}\, \#C_n.$$

Proof. Consider the function $f : C_n \to \mathbb{R}$ given by $f(c) := \frac{1}{2} (d(c, \bar A) - d(c, A))$. This 
function is 1-Lipschitz, and takes values at least k/2 on A. By symmetry the average 
of f is 0. So applying the above with t = k/2, we get that the (relative) measure of A in $C_n$ is at 
most $e^{-k^2/8n}$. □ 

The following is a refined version of Corollary 6, in which the set A is replaced 
with a measure $\xi$, cardinals are replaced with entropies, and the distance $d(A, \bar A)$ is 
replaced with $W_1(\xi, \bar\xi)$. 

Corollary 7. Let $\xi$ be a probability measure on the set of crossovers $C_n$. Let $\bar\xi$ be 
the complement of $\xi$, i.e. $\bar\xi(c) := \xi(\bar c)$ for $c \in C_n$. Then 

$$S(\xi) \leq \ln \#C_n - \frac{1}{8n}\, W_1(\xi, \bar\xi)^2$$

with S the Shannon entropy. 

Proof. The proof uses the following consequence of Proposition 5. 

Lemma 8 ($W_1 H$ inequality for crossovers). Let $\xi$ be a probability measure on $C_n$. 
Then 

$$W_1(\xi, \mu)^2 \leq 2n\, H(\xi|\mu)$$

where $\mu$ is the uniform probability measure on $C_n$ and H the relative entropy. 

Indeed, by a result of Bobkov and Götze (Theorem 3.1 in [BG99]), the inequality 
$W_1(\xi, \mu)^2 \leq 2n\, H(\xi|\mu)$ for all measures $\xi$ is equivalent to the Laplace transform 
estimate $\int e^{\lambda f}\, d\mu \leq e^{\lambda \int f\, d\mu + n\lambda^2/2}$ for all $\lambda \in \mathbb{R}$ and all 1-Lipschitz functions f. So the 
lemma is actually equivalent to Proposition 5. 

Now, since $W_1(\xi, \bar\xi) \leq W_1(\xi, \mu) + W_1(\mu, \bar\xi) = 2 W_1(\xi, \mu)$ by symmetry, we get 

$$W_1(\xi, \bar\xi)^2 \leq 4\, W_1(\xi, \mu)^2 \leq 8n\, H(\xi|\mu).$$

Finally, using $H(\xi|\mu) = \ln \#C_n - S(\xi)$, this rewrites in terms of the Shannon entropy 
as 

$$S(\xi) \leq \ln \#C_n - \frac{1}{8n}\, W_1(\xi, \bar\xi)^2.$$

□ 

5. POSITIVELY CURVED BRUNN-MINKOWSKI INEQUALITY 

Let us now prove Theorem 1. So let again A, B be two nonempty subsets of the 
hypercube $X = \{0,1\}^N$, and let M be the set of midpoints of A and B. We have to 
prove that 

$$\ln \#M \geq \frac{1}{2} (\ln \#A + \ln \#B) + \frac{d(A, B)^2}{16N}.$$

The difference with the case K = 0 is that we now consider all crossovers at once. 
Let $C_r$ be the set of r-crossovers. Let $Y := \{(a, b, c),\ a \in A,\ b \in B,\ c \in C_{d(a,b)}\}$. 
Consider the map $f : (a, b, c) \mapsto \Phi_c(a, b)$ from Y to $M \times M$. This map f may not be 
one-to-one; but we will show that it is not too-many-to-one. The idea is that, given a 
pair of midpoints (m, m'), the geometry of A and B allows one to guess, to some extent, 
which crossover was used, so that the cardinality of $f^{-1}(m, m')$ is bounded. (This is 
most clear when A is a singleton {00...00}, in which case there is no ambiguity on 
the crossover: every '1' in m or m' was taken from B.) 

Let $Y_r := \{(a, b, c) \in Y,\ d(a, b) = r\}$ and let likewise $(M \times M)_r := \{(m, m') \in M \times M,\ d(m, m') = r\}$. 
Now fix $(m, m') \in (M \times M)_r$. The fiber $f^{-1}(m, m')$ is 
in bijection with the set E of crossovers $c \in C_r$ such that $\Phi_c^{-1}(m, m') \in A \times B$. 
Consider, symmetrically, the set $E' = \{c \in C_r,\ \Phi_c^{-1}(m, m') \in B \times A\}$. By definition 
$\Phi_c = (\varphi_c, \varphi_{\bar c})$, so the elements of E' are the complements of the elements of E. 

We claim that $d(E, E') \geq d(A, B)$. Indeed, if $c \in E$, $c' \in E'$ we have $\varphi_c^{-1}(m, m') \in A$ 
and $\varphi_{c'}^{-1}(m, m') \in B$. Since decoding is isometric (Proposition 3) we have 
$d(c, c') \geq d(A, B)$. 

Corollary 6 then states that the cardinality of E is at most $\#C_r\, e^{-d(A,B)^2/8r}$. Since 
the cardinality of E is also the cardinality of the fiber $f^{-1}(m, m')$, this shows that 
the map $f : Y_r \to (M \times M)_r$ is at most $(\#C_r\, e^{-d(A,B)^2/8r})$-to-one. Consequently, 

$$\#Y_r \leq \#C_r\, e^{-d(A,B)^2/8r}\, \#(M \times M)_r.$$

Setting $(A \times B)_r := \{(a, b) \in A \times B,\ d(a, b) = r\}$, we have $\#Y_r = \#(A \times B)_r \times \#C_r$ 
so that 

$$\#(M \times M)_r \geq e^{d(A,B)^2/8r}\, \#(A \times B)_r.$$

Finally, summing over r from 1 to N and using $r \leq N$, we find 

$$\#(M \times M) \geq e^{d(A,B)^2/8N}\, \#(A \times B)$$

which proves Theorem 1. 

6. ENTROPY OF THE SET OF MIDPOINTS 

We now turn to the proof of Theorem 2. 

Remember that, given a and b in the hypercube X, the midpoint measure mid(a, b) 
is the uniform probability measure on all midpoints of a and b. The midpoint measure 
of two probability measures $\mu_A$ and $\mu_B$ is defined as 

$$\operatorname{mid}(\mu_A, \mu_B) := \iint \operatorname{mid}(a, b)\, d\mu_A(a)\, d\mu_B(b)$$

that is, the average of mid(a, b) where a and b are taken independently at random 
under $\mu_A$ and $\mu_B$. 

The proof follows the same lines as in the deterministic case, using probability 
measures instead of sets. The reader should think of the probability measures below 
as being nothing but weighted sets, and their Shannon entropy as being the logarithm 
of their cardinality. The main differences are as follows: 

• In the set-theoretic version, a key point was an estimation of the cardinality of 
the fibers of the map $(a, b, c) \mapsto (m, m') = \Phi_c(a, b)$. The lower bound on the 
cardinality of the set $\{(m, m')\}$ followed. Here, we will use the associativity 
of Shannon entropy to express the same relationship, yielding a lower bound 
on the entropy of (m, m') if the entropy of the fibers is known. 

• The final result involves $W_1(\mu_A, \mu_B)$ instead of $d(A, B)$. In the set-theoretic 
version, we used the map $c \mapsto \bar c$ and the fact that $\Phi_c(a, b) = \Phi_{\bar c}(b, a)$ to 
conclude that, if $\Phi_c(a, b) = \Phi_{c'}(a', b')$ then $d(c, \bar c') = d(b, a') \geq d(A, B)$. Then 
Corollary 6 was used to bound the cardinality of the set E of such crossovers 
c in a fiber. The refined version uses the relation $d(c, \bar c') = d(b, a')$ to turn 
any coupling between E and $\bar E$ into a coupling between A and B with the 
same transportation distance. Then, Corollary 7 is used as a refined version 
of Corollary 6 and yields a bound on the entropy of the crossovers c in a fiber. 

So let a and b be independent random variables with law $\mu_A$ and $\mu_B$. Let as above 
$C_r$ be the set of r-crossovers. Let c be a random variable uniformly distributed on 
$C_{d(a,b)}$, independent of a and b conditionally to d(a, b). Let us define the random 
variables $m := \varphi_c(a, b)$ and $m' := \varphi_{\bar c}(a, b)$. Thus the law of m is $\operatorname{mid}(\mu_A, \mu_B)$, as is 
the law of m'. 

Let us slightly abuse notation and denote by S((y)) the Shannon entropy of the 
law of a random variable y. We have $S((m, m')) \leq S((m)) + S((m'))$ but since m 
and m' have the same law $\operatorname{mid}(\mu_A, \mu_B)$, we get 

$$S(\operatorname{mid}(\mu_A, \mu_B)) \geq \frac{1}{2}\, S((m, m')).$$

Consider as above the map $\Phi$ sending (a, b, c) to $\Phi_c(a, b) = (m, m')$. Let $Y_{(m,m')}$ 
be the law of (a, b, c) knowing (m, m'). By the associativity of entropy, the Shannon 
entropy of the law of (m, m') is the entropy of the law of (a, b, c) minus the average 
entropy of fibers of $\Phi$, namely: 

$$S((m, m')) = S((a, b, c)) - \mathbb{E}\, S(Y_{(m,m')}).$$

The first term is computed as follows. The random variables a and b are independent, 
and, conditionally to d(a, b), the variable c is independent of a and b with law 
the uniform distribution $U_{d(a,b)}$ on $C_{d(a,b)}$. So 

$$S((a, b, c)) = S((a)) + S((b)) + \mathbb{E}\, S(U_{d(a,b)}) = S(\mu_A) + S(\mu_B) + \mathbb{E} \ln \#C_{d(a,b)}.$$

Let us turn to the second term $\mathbb{E}\, S(Y_{(m,m')})$. This means we have to evaluate the 
entropy of the fibers of $\Phi$, as in the non-random case. 

Let $E_{(m,m')}$ be the law of c knowing (m, m') (i.e., the third marginal of $Y_{(m,m')}$). 
Given (m, m'), the value of c determines a and b, and so $S((a, b, c)|(m, m')) = S((c)|(m, m'))$, i.e. 

$$S(Y_{(m,m')}) = S(E_{(m,m')})$$

so that 

$$S((m, m')) = S(\mu_A) + S(\mu_B) + \mathbb{E} \ln \#C_{d(a,b)} - \mathbb{E}\, S(E_{(m,m')}).$$

If, at this point, we apply the crude estimate $S(E_{(m,m')}) \leq \ln \#C_{d(m,m')}$, we get 
$S((m, m')) \geq S(\mu_A) + S(\mu_B) + \mathbb{E} \ln \#C_{d(a,b)} - \mathbb{E} \ln \#C_{d(m,m')} = S(\mu_A) + S(\mu_B)$ since 
$d(a, b) = d(m, m')$. This implies $S((m)) \geq \frac{1}{2}(S(\mu_A) + S(\mu_B))$, i.e. the case K = 0 in 
the theorem. 

As in the set-theoretic case, we will show that $E_{(m,m')}$ has small Shannon entropy 
by using concentration properties in the set of crossovers. Corollary 7 tells us that 

$$S(E_{(m,m')}) \leq \ln \#C_{d(m,m')} - \frac{1}{8\, d(m, m')}\, W_1(E_{(m,m')}, \bar E_{(m,m')})^2$$

where $\bar E_{(m,m')}$ is the image of $E_{(m,m')}$ by $c \mapsto \bar c$. Thus, we need to evaluate the 
distance between $E_{(m,m')}$ and $\bar E_{(m,m')}$, as in the deterministic case. 

Actually we only need an estimate on average over (m, m'). We claim that 

$$\mathbb{E}\, W_1(E_{(m,m')}, \bar E_{(m,m')})^2 \geq W_1(\mu_A, \mu_B)^2.$$

Indeed, let us fix (m, m') for now, and let $A_{(m,m')}$ and $B_{(m,m')}$ be the laws of a 
and b knowing (m, m'), respectively. Since $a = \varphi_c^{-1}(m, m')$ and $b = \varphi_{\bar c}^{-1}(m, m')$, any 
coupling between $E_{(m,m')}$ and $\bar E_{(m,m')}$ determines a coupling between $A_{(m,m')}$ and 
$B_{(m,m')}$. Moreover, since decoding is isometric by Proposition 3, these couplings 
will define the same transportation distance. So we get 
$W_1(A_{(m,m')}, B_{(m,m')}) \leq W_1(E_{(m,m')}, \bar E_{(m,m')})$. 

If for each (m, m') we are given a coupling between $A_{(m,m')}$ and $B_{(m,m')}$, by 
summation this defines a coupling between $\mu_A$ and $\mu_B$ and so 
$W_1(\mu_A, \mu_B) \leq \mathbb{E}\, W_1(A_{(m,m')}, B_{(m,m')})$. Thus $W_1(\mu_A, \mu_B) \leq \mathbb{E}\, W_1(E_{(m,m')}, \bar E_{(m,m')})$. Then, by 
convexity we get 

$$W_1(\mu_A, \mu_B)^2 \leq \mathbb{E}\, W_1(E_{(m,m')}, \bar E_{(m,m')})^2$$

as announced. 

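Putting everything together and using that $d(m, m') = d(a, b)$, we get 

$$\begin{aligned}
S((m, m')) &= S((a, b, c)) - \mathbb{E}\, S(Y_{(m,m')}) \\
&= S(\mu_A) + S(\mu_B) + \mathbb{E} \ln \#C_{d(a,b)} - \mathbb{E}\, S(E_{(m,m')}) \\
&\geq S(\mu_A) + S(\mu_B) + \mathbb{E} \ln \#C_{d(a,b)} - \mathbb{E} \ln \#C_{d(m,m')} + \mathbb{E}\, \frac{W_1(E_{(m,m')}, \bar E_{(m,m')})^2}{8\, d(m, m')} \\
&\geq S(\mu_A) + S(\mu_B) + \frac{1}{8N}\, \mathbb{E}\, W_1(E_{(m,m')}, \bar E_{(m,m')})^2 \\
&\geq S(\mu_A) + S(\mu_B) + \frac{1}{8N}\, W_1(\mu_A, \mu_B)^2
\end{aligned}$$

and so, since $S(\operatorname{mid}(\mu_A, \mu_B)) \geq \frac{1}{2} S((m, m'))$, 

$$S(\operatorname{mid}(\mu_A, \mu_B)) \geq \frac{1}{2} S(\mu_A) + \frac{1}{2} S(\mu_B) + \frac{1}{16N}\, W_1(\mu_A, \mu_B)^2$$

which ends the proof. 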